A Study of Interestingness Measures for Associative Classification on Imbalanced Data

Authors

  • Guangfei Yang
  • Xuejiao Cui
Abstract

Associative Classification (AC) is a well-known tool in knowledge discovery and has been shown to produce competitive classifiers. However, imbalanced data poses a challenge for most classifier learning algorithms, including AC methods. Because Interestingness Measures (IMs) play an important role in the AC process, both in generating interesting rules and in building good classifiers, selecting suitable IMs is important for improving AC's performance on imbalanced data. In this paper, we aim to improve AC's performance on imbalanced data by studying IMs. This involves two main tasks: first, finding which measures behave similarly on imbalanced data; second, selecting appropriate measures. We evaluate each measure's performance by AUC, which is commonly used to evaluate classification on imbalanced data. First, based on these performances, we propose a frequent correlated patterns mining method to extract stable clusters in which the IMs have similar behaviors. Second, using an IM ranking computation method, we identify 26 suitable measures for imbalanced data and divide them into two groups: one especially for extremely imbalanced data and the other suitable for slightly imbalanced data.
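As a rough illustration of the two ingredients the abstract mentions, the sketch below computes three textbook interestingness measures (support, confidence, lift) for a class association rule A -> c, and a pairwise AUC. The abstract does not list which measures the authors studied, so these three are only assumed examples, and the function names `rule_measures` and `auc` and all counts are hypothetical.

```python
def rule_measures(n_ac, n_a, n_c, n):
    """Support, confidence and lift of a rule A -> c from raw counts.

    n_ac: records matching antecedent A and class c
    n_a:  records matching antecedent A
    n_c:  records of class c
    n:    total number of records
    """
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)  # confidence relative to the class prior
    return {"support": support, "confidence": confidence, "lift": lift}


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes labels are 0/1 and that
    both classes are present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data set: 1000 records, 20 of them in the minority class.
minority_rule = rule_measures(n_ac=8, n_a=10, n_c=20, n=1000)
majority_rule = rule_measures(n_ac=490, n_a=500, n_c=980, n=1000)
```

With these made-up counts, the majority-class rule has the higher confidence (0.98 against 0.8), yet its lift is 1, meaning it is no better than the class prior, while the minority-class rule has a lift of about 40. This is one way to see why the choice of measure matters when classes are imbalanced.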


Similar resources

On Mining Fuzzy Classification Rules for Imbalanced Data

A fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...


Role of Interestingness Measures in CAR Rule Ordering for Associative Classifier: An Empirical Approach

The Associative Classifier is a novel technique that integrates Association Rule Mining and Classification. The difficult task in building an Associative Classifier model is selecting relevant rules from a large number of class association rules (CARs). A very popular method of ordering rules for selection is based on confidence, support and antecedent size (CSA). Other methods are ...


Increasing the Interpretability of Rules Induced from Imbalanced Data by Using Bayesian Confirmation Measures

Approaches to support the interpretation of rules induced from imbalanced data are discussed. In this paper, the rule learning algorithm BRACID, dedicated to class imbalance, is considered. As it may induce too many rules, which hinders their interpretation, the rules should be filtered. We introduce three different post-pruning strategies, which aim at selecting rules having good descriptive...


Generic Associative Classification Rules: A Comparative Study

Associative classification is a supervised classification approach that integrates association mining and classification. Several studies in data mining have shown that associative classification achieves higher classification accuracy than traditional classification techniques. However, associative classification suffers from a major drawback: The huge number of the generated classificatio...


Publication date: 2015